Add support for extended resource definition in GCE MIG template #62

zaymat · 2022-10-06T20:44:17Z

Signed-off-by: Mayeul Blanzat [email protected]

Which component this PR applies to?

cluster-autoscaler

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds support for extended resources in GCE MIG templates.

Today, the cluster-autoscaler on GCE only supports scaling decisions based on the following resources: CPU, Memory, EphemeralStorage and GPU. However, Kubernetes allows defining an arbitrary number of resources through the Extended Resource API. This is useful when your instances have special resources that you want to share between your pods. One example is network bandwidth, which pods cannot request except through extended resources.
Some other implementations of the CloudProvider interface, such as AWS and Azure, already support scaling decisions based on extended resources.

This PR adds the possibility to define extended resources for a node group on GCE, so that the cluster-autoscaler can account for them when making scaling decisions. This is done through the `extended_resources` key inside the `AUTOSCALER_ENV_VARS` variable set on a MIG template.

Example:

AUTOSCALER_ENV_VARS: kube_reserved=<...>;<...>;extended_resources=foo=10,bar=1M,foobar=2G
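
For illustration, here is a minimal sketch of how a value in this format could be split into per-resource quantities. It is not the code from this PR: the function names (`extractEnvVar`, `parseQuantity`, `parseExtendedResources`) are hypothetical, the `kube_reserved` value is a placeholder, and the quantity parser is a simplified stand-in for Kubernetes' `resource.Quantity` that only accepts integers with the decimal suffixes `k`, `M`, `G` and `T`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// extractEnvVar returns the value stored under key in a semicolon-separated
// "k1=v1;k2=v2" string, the layout used by the AUTOSCALER_ENV_VARS example.
func extractEnvVar(envVars, key string) (string, bool) {
	for _, entry := range strings.Split(envVars, ";") {
		// Cut at the first "=" only: the value may itself contain "=".
		k, v, ok := strings.Cut(entry, "=")
		if ok && strings.TrimSpace(k) == key {
			return v, true
		}
	}
	return "", false
}

// multipliers maps decimal suffixes to their factor (simplified assumption).
var multipliers = map[string]int64{"k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}

// parseQuantity converts strings such as "10", "1M" or "2G" to an integer.
func parseQuantity(s string) (int64, error) {
	// Walk back over the trailing non-digit characters to find the suffix.
	i := len(s)
	for i > 0 && (s[i-1] < '0' || s[i-1] > '9') {
		i--
	}
	n, err := strconv.ParseInt(s[:i], 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid quantity %q: %w", s, err)
	}
	if s[i:] == "" {
		return n, nil
	}
	m, ok := multipliers[s[i:]]
	if !ok {
		return 0, fmt.Errorf("unknown suffix in quantity %q", s)
	}
	return n * m, nil
}

// parseExtendedResources parses "foo=10,bar=1M" into a name-to-quantity map.
func parseExtendedResources(spec string) (map[string]int64, error) {
	out := make(map[string]int64)
	for _, pair := range strings.Split(spec, ",") {
		name, qty, ok := strings.Cut(pair, "=")
		if !ok || name == "" {
			return nil, fmt.Errorf("malformed extended resource %q", pair)
		}
		n, err := parseQuantity(qty)
		if err != nil {
			return nil, err
		}
		out[name] = n
	}
	return out, nil
}

func main() {
	// The kube_reserved value below is a placeholder, not a recommendation.
	envVars := "kube_reserved=cpu=100m;extended_resources=foo=10,bar=1M,foobar=2G"
	spec, ok := extractEnvVar(envVars, "extended_resources")
	if !ok {
		fmt.Println("no extended resources defined")
		return
	}
	resources, err := parseExtendedResources(spec)
	fmt.Println(resources, err)
}
```

Cutting each entry at the first `=` matters here because the value of `extended_resources` is itself a list of `name=quantity` pairs.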

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

kubernetes#3104 ensured that access to replicas during scale-down operations was never stale by querying the API server. kubernetes#3312 honoured that behaviour while moving to the unstructured client. kubernetes#4443 regressed that behaviour while trying to reduce the API server load. kubernetes#4634 restored the never-stale replicas behaviour at the cost of loading the API server again. Currently, on e.g. a 48-minute cluster, it makes 1.4k GET requests to the scale subresource. This PR tries to both keep replicas non-stale during scale down and prevent the API server from being overloaded. To achieve that, it lets targetSize, which is called on every autoscaling cluster state loop, come from the cache. Also note that the scale-down implementation has changed: https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.

Containers in a recommendation can differ from the containers in a pod:
- A new container can be added to a pod. At first, there will be no recommendation for the container.
- A container can be removed from a pod. For some time, the recommendation will still contain an entry for the old container.
- A container can be renamed. Then there will be a recommendation for the container under its old name.

Add tests for what VPA does in those situations.

Containers in a recommendation can differ from the containers in a pod:
- A new container can be added to a pod. At first, there will be no recommendation for the container.
- A container can be removed from a pod. For some time, the recommendation will still contain an entry for the old container.
- A container can be renamed. Then there will be a recommendation for the container under its old name.

Add tests for what VPA does in those situations when a limit range exists.

This commit adds the possibility to define extended resources for a node group on GCE, so that the cluster-autoscaler can account for them when making scaling decisions. This is done through the `extended_resources` key inside the `AUTOSCALER_ENV_VARS` variable set on a MIG template.

Signed-off-by: Mayeul Blanzat <[email protected]>

…add more tests

* A malformed extended resource definition should not fail the template building function. Instead, log the error and ignore extended resources
* Remove useless existence check
* Add tests around the extractExtendedResourcesFromKubeEnv function
* Add a test case to verify that a malformed extended resource definition does not fail the template build function

Signed-off-by: Mayeul Blanzat <[email protected]>
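
The "log and ignore" behaviour described in this commit could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from the commit: `nodeTemplate`, `parseExtended` and `buildTemplate` are hypothetical names, and quantities are kept as raw strings to keep the example short.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// nodeTemplate is a hypothetical, trimmed-down stand-in for a node template.
type nodeTemplate struct {
	CPU      string
	Extended map[string]string
}

// parseExtended parses "foo=10,bar=1M" and rejects any malformed pair.
func parseExtended(spec string) (map[string]string, error) {
	out := map[string]string{}
	if spec == "" {
		return out, nil
	}
	for _, pair := range strings.Split(spec, ",") {
		name, qty, ok := strings.Cut(pair, "=")
		if !ok || name == "" || qty == "" {
			return nil, fmt.Errorf("malformed extended resource %q", pair)
		}
		out[name] = qty
	}
	return out, nil
}

// buildTemplate never fails because of extended resources: on a parse error
// it logs the problem and returns a template without any extended resources.
func buildTemplate(cpu, extendedSpec string) nodeTemplate {
	tpl := nodeTemplate{CPU: cpu, Extended: map[string]string{}}
	ext, err := parseExtended(extendedSpec)
	if err != nil {
		log.Printf("ignoring extended resources: %v", err)
		return tpl
	}
	tpl.Extended = ext
	return tpl
}

func main() {
	fmt.Printf("%+v\n", buildTemplate("4", "foo=10,bar=1M"))
	// "bar" has no quantity, so the whole definition is ignored.
	fmt.Printf("%+v\n", buildTemplate("4", "foo=10,bar"))
}
```

In this sketch a single malformed entry discards the whole definition, so the autoscaler would treat the node group as having no extended resources rather than a partially parsed set.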

k8s-ci-robot and others added 30 commits July 5, 2022 03:04

Merge pull request kubernetes#5004 from yaroslava-serdiuk/node-temlat…

042badc

…e-fix GCE: Always add boot disk annotations to templates

Deduplicate Migration Doc from README.

d433cd3

Merge pull request kubernetes#4728 from moolen/feature/nonroot

0fcbac8

feat: use non-root user for base-image

Allow balancing by labels exclusively

1b98b38

Adds a new flag `--balance-label` which allows users to balance between node groups exclusively via labels. This gives users the flexibility to specify the similarity logic themselves when --balance-similar-node-groups is in use.

update cloud-provider-azure version for azure imports

c77a4ac

Merge pull request kubernetes#5011 from gandhipr/bug-update-vendor-fi…

0c3e9d1

…le-version update cloud-provider-azure version for azure imports

Merge pull request kubernetes#4134 from airbnb/es--expander-plugin-pr…

f990344

…oposal CA expander plugin proposal

Merge pull request kubernetes#5005 from Shubham82/deduplicate-Migrati…

01ef7c1

…on_Doc Deduplicate Migration Doc from README.

Set container image to upstream

caf3558

Set tolerations and node selector to schedule on control-plane

0cd5f27

Update deployment manifest on control-plane

ed1c8f8

Add required IAM operations documentation

40e950f

Add empty line at end of file

3ec69a0

Comment scale-down test delays

6660026

Initial vendoring

6ff006f

Update imports and add instructions for vendored code

c37599c

Update go.mod with new net dependency

32f10eb

Update verify scripts

aac78b6

chore: Documenting flags for VPA recommender & updater

036826d

Update/Fix

86463f9

Fix storage

9ef0bbf

Revert "Adding support for identifying nodes that have been deleted f…

66bfe55

…rom cloud provider that are still registered within Kubernetes"

CA: GCE: implement GetMachineFamily, fix IsCustomMachine

6ba8d27

IsCustomMachine didn't take machine types with family prefix (e.g. n2-custom-2-2816) into account.

Merge pull request kubernetes#5023 from x13n/revert-4896-master

4b97a77

Revert "Adding support for identifying nodes that have been deleted from cloud provider that are still registered within Kubernetes"

Merge pull request kubernetes#5024 from towca/jtuznik/families

db5e2f2

CA: GCE: implement GetMachineFamily, fix IsCustomMachine

Merge pull request kubernetes#5021 from gsweene2/vpa-document-rec-and…

d8b9847

…-updater-flags chore: Document params for VPA recommender & updater (similar to CA's FAQs)

Merge pull request kubernetes#5017 from PhilippeChepy/pchepy/update-c…

9587e17

…as-exoscale-documentation exoscale provider: Update cluster autoscaler documentation

Merge pull request kubernetes#5003 from Andrius521/fix/examples-docum…

2993b2e

…entation Fix/examples documentation

Fixed the Hyperlinks of HPA.

c275757

elmiko and others added 21 commits September 29, 2022 14:22

cleanup unused constants in clusterapi provider

5c9cc27

this change removes some unused values and adjusts the names in the unit tests to better reflect usage.

Merge pull request kubernetes#5222 from elmiko/capi-cleanup

728bea6

cleanup unused constants in clusterapi provider

Update the example spec of civo cloudprovider

030a7f5

Signed-off-by: Vishal Anarse <[email protected]>

Fix race condition in scale down test

a99294d

Merge pull request kubernetes#5227 from yaroslava-serdiuk/batch-test

9cae42c

Fix race condition in scale down test

add example for multiple recommenders

3d9ab55

Merge pull request kubernetes#5226 from vishalanarase/civo-update-exa…

cdf8406

…mple-spec Update the example spec of civo cloudprovider

Merge pull request kubernetes#5209 from helio/gpu-processor-directx

5417153

Support for DirectX Devices

Merge pull request kubernetes#5205 from Shubham82/update_golang_version

7db0d94

Updated the golang version for the GitHub workflows.

Remove units for default boot disk size

e75a769

Merge pull request kubernetes#5233 from yaroslava-serdiuk/boot-disk

078a6e0

Remove units for default boot disk size

Merge pull request kubernetes#5231 from matthyx/doc

9a92bcf

add example for multiple recommenders

Merge pull request kubernetes#5232 from jbartosik/e2e-test-admission-…

eded7ec

…pod-recommendation-mismatch E2e test admission pod recommendation mismatch

Merge pull request kubernetes#5213 from Freyert/gce-409-skip

f63315c

[gce]: skip instances on validation error

Merge pull request kubernetes#5196 from JamesClonk/master

ee09474

fix typo

Merge pull request kubernetes#5193 from juanitomint/master

ddf0fe0

CA - AWS - Instance List Update 2022-09-16

Merge pull request kubernetes#5210 from antonkurbatov/bugfix/magnum-t…

c65a3a3

…ls-insecure magnum: add an option to create insecure TLS connections

Merge pull request kubernetes#5167 from Shubham82/Correct_Pod_Priorit…

4ff4903

…y_and_Preemption_links Corrected the links for Priority in k8s API and Pod Preemption in k8s.

Fixed gofmt error.

d1f2acf

zaymat force-pushed the mayeul/add-extended-resource-support-in-gce branch 2 times, most recently from 8ec94a5 to 17f8faf Compare October 10, 2022 15:09

k8s-ci-robot and others added 3 commits October 10, 2022 19:25

Merge pull request kubernetes#5241 from Shubham82/update-gofmt

e206ae2

Fixed gofmt error.

Don't break scale up with priority expander config

2ee8023

Merge pull request kubernetes#5246 from x13n/priority-expander

ae9ed65

Don't break scale up with priority expander config

zaymat force-pushed the mayeul/add-extended-resource-support-in-gce branch from 17f8faf to 15bb504 Compare October 11, 2022 10:03

zaymat force-pushed the mayeul/add-extended-resource-support-in-gce branch from 15bb504 to e286a95 Compare October 11, 2022 12:34
